Abstract

We identify the common underlying form of the capacity expression that is applicable to 
both cases where causal or non-causal side information is made available to the transmitter. 
Using this common form we find that for the single user channel, the multiple access channel,
the degraded broadcast channel, and the degraded relay channel, the sum capacities with causal
and non-causal side information are identical when all the transmitter side information is also
made available to all the receivers. A genie-aided outer bound is developed that states that when
a genie provides n bits of side information to a receiver the resulting capacity improvement
cannot be more than n bits. Combining these two results we are able to bound the relative capacity
advantage of non-causal side information over causal side information for both single user as well 
as various multiple user communication scenarios. Applications of these capacity bounds are 
demonstrated through examples of random access channels. Interestingly, the capacity results 
indicate that the excessive MAC layer overheads common in present wireless systems may be 
avoided through coding across multiple access blocks. It is also shown that even one bit of side 
information at the transmitter can result in unbounded capacity improvement. 



*This work was presented in part at the IEEE Communication Theory Workshop, June 12-15, 2005 and at the 
Forty-third Annual Allerton Conference on Communication, Control, and Computing, Sept. 28-30, 2005. 



1 Introduction 



Characterizing the capacity of reliable communication between a transmitter and a receiver in 
the presence of side information is a problem as old as information theory itself. Starting with 
Shannon's characterization of the capacity with causal side information [1] and Kuznetsov and
Tsybakov's seminal work [2] on coding for computer memories with defective cells that led to a
general characterization of capacity with non-causal side information by Gel'fand and Pinsker [3], 
the theory of communication with side information has evolved as a dichotomy between the two 
cases of non-causal side information and causal side information. Capacity results with non-causal 
side information were generalized to the case of rate limited side information by Heegard and El 
Gamal [4] and more recently by Rosenzweig et al. [5] for single user communication and by Cemal
and Steinberg [6] for the multiple access channel. Study of non-causal side information has led
to interesting duality relationships such as that between source coding and channel coding with 
side information [7] as well as the duality between the Gaussian multiple input multiple output 
(MIMO) broadcast and multiple access channels [8-10]. In addition to data storage [2,4] and 
data-hiding/watermarking/steganography [11] these results have found applications on Gaussian 
channels with additive Gaussian interference [3, 12-14] recently leading to the determination of 
first the sum capacity [9, 10, 15, 16] and then the entire capacity region for the MIMO broadcast 
channel [17-19]. 

The case of causal side information has been studied separately by researchers [1,20-23]. The 
original capacity result due to Shannon [1] requires coding over an extended alphabet of mappings 
from the channel state to the input alphabet. Caire and Shamai [21] showed that when the side 
information at the transmitter is a deterministic function of the side information available to the 
receiver, capacity achieving codes can be constructed directly on the input alphabet. The capacity 
of the time varying multiple access channel with causal side information was explored by Das and 
Narayan [22] and more recently by Kim and Sigurjonsson [23]. 

With the availability of all the above mentioned capacity results on causal and non-causal side 
information, there is a need to develop a unified view of communication with side information. A 
unified view would allow us to relate, combine and extend the existing results to new applications. 
In this paper, we approach this objective through the following questions: 

1. Common Framework: What is the fundamental connection between the capacity character- 
izations with causal and non-causal side information at the transmitter? 

2. Value of a bit of side information: What is the maximum possible capacity improvement with 
n bits of any kind (causal, non-causal, memoryless or correlated) of side information provided 
by a genie to the transmitter/receiver?

3. Non-causal versus Causal: What is the relative capacity advantage of non-causal side
information over causal side information?



4. Impact of Correlation: How does correlation, either temporal or between the transmitter and 
receiver side information, affect the capacity? 

5. Multiple users: Finally, how do the causal and non-causal capacity characterizations extend 
to multiple user channels, e.g. the multiple access, broadcast and relay channels? 

In this work we focus on both causal and non-causal side information, the relationship between 
them, and their extensions to multiuser communications. Our interest is in general capacity ex- 
pressions with finite states. Specialized expressions for AWGN or fading channels may be obtained 
as special cases from these general expressions, subject to input distribution optimizations. 



2 Background and Channel Model 



The channel is a discrete memoryless channel (DMC) with message $W$, input $X_i$, output $Y_i$, state
$S_i$, transmitter side information $S_{T,i}$ and receiver side information $S_{R,i}$, so that
$P(S^n, S_T^n, S_R^n) = \prod_{i=1}^{n} P(S_i, S_{T,i}, S_{R,i})$ and
$P(Y^n|X^n, S^n) = \prod_{i=1}^{n} P(Y_i|X_i, S_i)$. The only difference between the causal
and non-causal side information cases is that with causal side information the input to the channel at
time $i$ can only depend on the present and past states but not the future states, $X_i(W, S_T^i)$, whereas
with non-causal side information all inputs can depend on the entire state sequence, $X_i(W, S_T^n)$.
Notice that for the receiver it does not matter if the side information is made available causally
or non-causally. This is because the receiver can wait till the end of transmission to decode the
message. Figure 1 illustrates the two scenarios. Probability of error, achievable rates and the
capacity of this channel are defined in the standard sense [21,24].



Figure 1: Channel Model with (a) Causal, and (b) Non-causal side information.



For non-causal side information at the transmitter, the single user capacity is known to be [2-4]: 

$$C^{\text{non-causal}} = \max_{\mathcal{P}^{\text{non-causal}}} I(U;Y) - I(U;S_T), \tag{1}$$

where $\mathcal{P}^{\text{non-causal}} = \{P(U,X|S_T) = P(U|S_T)P(X|U,S_T)\}$. Comparing this to the case where no side
information is available ($S_T = \phi$),

$$C = \max_{P(U,X)} I(U;Y) = \max_{P(X)} I(X;Y), \tag{2}$$

note that the availability of side information at the transmitter is helpful in that the transmitter
can match its input to the channel state by picking the inputs $U, X$ conditioned
on $S_T$, as opposed to (2) where the input cannot be matched to the channel state. However, the
benefit of matching the input to the channel state comes with the cost of the subtractive term in
(1), i.e., $I(U;S_T)$, which can be interpreted as the overhead required to communicate to the receiver
the adaptation to the channel state at the transmitter. For the case where the side information
is available at the transmitter only causally, the capacity expression was found by Shannon
[1] as

$$C^{\text{causal}} = \max_{P(T)} I(T;Y), \tag{3}$$

where T is an extended alphabet of mappings from the channel state to the input alphabet. 

The capacity expressions (1), (2), (3) explicitly account for side information at the transmitter.
Side information at the receiver, $S_R$, is easily incorporated into the same expressions by replacing
$Y$ with $(Y, S_R)$ in the corresponding expressions.

We start by finding a common form for the causal and non-causal cases. 

3 Relating Capacity with Causal and Non-causal Side Information 

Comparing the non-causal case (1) with the causal case (3), the two capacity expressions are in
different forms so that their relationship is not obvious. We start by making the relationship clearly
apparent by representing the causal case in a different form, comparable to (1). The following
result has also been observed and derived independently in parallel work by [23] and [6].



$$C^{\text{non-causal}} = \max_{\mathcal{P}^{\text{non-causal}}} I(U;Y,S_R) - I(U;S_T),$$

$$C^{\text{causal}} = \max_{\mathcal{P}^{\text{causal}}} I(U;Y,S_R) - I(U;S_T),$$

with

$$\mathcal{P}^{\text{non-causal}} = \{P(U,X|S_T) = P(U|S_T)\,P(X|U,S_T)\},$$

$$\mathcal{P}^{\text{causal}} = \{P(U,X|S_T) = P(U)\,P(X|U,S_T)\}.$$

In the non-causal case, the choice of $U$ can be made conditional on the channel state $S_T$. In the
causal case $U$ is picked independent of $S_T$. This makes the subtractive term equal to zero for the
causal case. In both cases, it suffices for the optimal input symbol $X$ to be just a function of $U, S_T$,
i.e. $P(X|U, S_T)$ is either 0 or 1.

Sketch of Proof: [Converse] Achievability of the capacity expression with causal side information 
is straightforward. Interestingly, the converse allows a very simple proof, as follows. Starting with 
Fano's inequality, 

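$$
\begin{aligned}
nR &\le I(W; Y^n, S_R^n) + n\epsilon_n && (4)\\
&\le \sum_{i=1}^{n} I(W, S_T^{i-1}; Y_i, S_{R,i} \mid Y^{i-1}, S_R^{i-1}) + n\epsilon_n && (5)\\
&= \sum_{i=1}^{n} \left[ H(Y_i, S_{R,i} \mid Y^{i-1}, S_R^{i-1}) - H(Y_i, S_{R,i} \mid Y^{i-1}, S_R^{i-1}, W, S_T^{i-1}) \right] + n\epsilon_n && (6)\\
&\le \sum_{i=1}^{n} \left[ H(Y_i, S_{R,i}) - H(Y_i, S_{R,i} \mid S_T^{i-1}, W) \right] + n\epsilon_n && (7)\\
&= \sum_{i=1}^{n} I(U_i; Y_i, S_{R,i}) + n\epsilon_n && (8)
\end{aligned}
$$

where (7) follows because the current output is independent of the past outputs, conditioned on all
the past inputs. $U_i = (W, S_T^{i-1})$ is the auxiliary random variable, independent of the current channel
state $S_{T,i}$. ■
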
The non-causal capacity expression above has been shown [7,25] to be the common form of single user
capacity for all four cases of non-causal side information as well as the corresponding cases for
rate-distortion. In other words, for the capacity problem, whether the non-causal side information is
available at the transmitter, the receiver, both, or neither, the capacity expression has the common form
$I(U;Y,S_R) - I(U;S_T)$. The only difference is in the constraints on the distribution of the auxiliary
random variable $U$, the input alphabet $X$ and the state variable $S$. Thus, combining the results of
[7,25] with the common expression obtained above we find that the expression $I(U;Y,S_R) - I(U;S_T)$
is indeed the common expression not only for all cases of non-causal side information but also for
causal side information.

$$
\begin{aligned}
C^{\text{non-causal}}(S_T, S_R) &= \max_{P(U,X|S_T)} I(U;Y,S_R) - I(U;S_T)\\
C^{\text{non-causal}}(\phi, S_R) &= \max_{U=X,\,P(X)} I(U;Y,S_R) - I(U;S_T) = \max_{P(X)} I(X;Y,S_R)\\
C^{\text{non-causal}}(S_T, \phi) &= \max_{P(U,X|S_T)} I(U;Y,S_R) - I(U;S_T) = \max_{P(U,X|S_T)} I(U;Y) - I(U;S_T)\\
C^{\text{non-causal}}(\phi, \phi) &= \max_{U=X,\,P(X)} I(U;Y,S_R) - I(U;S_T) = \max_{P(X)} I(X;Y)\\[4pt]
C^{\text{causal}}(S_T, S_R) &= \max_{P(U)P(X|U,S_T)} I(U;Y,S_R) - I(U;S_T) = \max_{P(U)P(X|U,S_T)} I(U;Y,S_R)\\
C^{\text{causal}}(\phi, S_R) &= \max_{U=X,\,P(X)} I(U;Y,S_R) - I(U;S_T) = \max_{P(X)} I(X;Y,S_R)\\
C^{\text{causal}}(S_T, \phi) &= \max_{P(U)P(X|U,S_T)} I(U;Y,S_R) - I(U;S_T) = \max_{P(U)P(X|U,S_T)} I(U;Y)\\
C^{\text{causal}}(\phi, \phi) &= \max_{U=X,\,P(X)} I(U;Y,S_R) - I(U;S_T) = \max_{P(X)} I(X;Y)
\end{aligned}
$$

Investigating the relationship between causal and non-causal information further, we prove that 
the two capacities are the same if the transmitter side information is also available to the receiver. 

Theorem 1 (Relationship between causal and non-causal side information capacity) If the side
information at the transmitter is a deterministic function of the side information at the receiver,
i.e., if $S_T = f(S_R)$, then capacity with causal side information is equal to the capacity with
non-causal side information.

Capacity achieving codes, in both cases, can be constructed directly on the input alphabet and the 
auxiliary random variable U is not required. 

Proof: 

$$
\begin{aligned}
C^{\text{non-causal}} &= \max_{P(U,X|S_T)=P(U|S_T)P(X|U,S_T)} I(U;Y,S_R) - I(U;S_T)\\
&= \max_{P(U,X|S_T)=P(U|S_T)P(X|U,S_T)} I(U;S_R|S_T) + I(U;Y|S_R)\\
&= \max_{P(U,X|S_T)=P(U|S_T)P(X|U,S_T)} I(U;Y|S_R)\\
&= C^{\text{causal}}
\end{aligned}
$$

The second equality uses the chain rule together with $S_T = f(S_R)$, so that
$I(U;S_R) = I(U;S_T) + I(U;S_R|S_T)$. The third equality holds because $U$ depends on $S_R$ only
through $S_T$, which makes $I(U;S_R|S_T) = 0$. The last equality follows from the results of [21] for
causal side information. It is shown in [21] that with causal side information at the transmitter, if
$S_T = f(S_R)$, i.e., if the side information at the transmitter is a deterministic function of the side
information at the receiver, then the auxiliary random variable $U$ is not needed and coding can be
performed directly on the input alphabet $X$.

Thus, if the side information at the transmitter is a deterministic function of the side information
at the receiver, then capacity with non-causal side information is the same as the capacity with
causal side information. ■

Next we investigate the value of side information through capacity bounds. 

4 Genie bits and the Value of Side Information



In this section we answer the question: what is the maximum capacity gain that can result from 
a fixed number of bits of side information? For example, suppose a genie provides a fixed number 
of bits of side information G to the transmitter or the receiver per channel use. There are no 
constraints on the genie provided side information, i.e., it could be causal, non-causal, temporally 
correlated, or i.i.d. A fundamental question is whether the capacity increase, $C_G - C$, due to a
fixed number of genie bits is bounded and if so, then what is the maximum capacity benefit. The 
following results show that transmitter and receiver side information are fundamentally different 
in their potential capacity advantages. 

4.1 Receiver Side Information 

Theorem 2 The maximum possible capacity improvement due to the availability of receiver side 
information is bounded by the amount of the side information itself. 



$$C_G - C \le \mathcal{H}(G) = \lim_{N\to\infty} \frac{1}{N} H(G_1, G_2, \cdots, G_N).$$

Proof:

$$
\begin{aligned}
C_G &= \sup \lim_{N\to\infty} \frac{1}{N} I(W; Y^N, G^N)\\
&= \sup \lim_{N\to\infty} \frac{1}{N} \left( I(W;Y^N) + I(W;G^N|Y^N) \right)\\
&= C + \Delta C
\end{aligned}
$$

where the sup is over the multi-letter distributions of allowed input strategies, $C_G$ is the capacity
with the side information provided by the genie, $C$ is the capacity without the side information
and $\Delta C$, the capacity improvement, is bounded by the entropy rate $\mathcal{H}(G)$ because
$I(W;G^N|Y^N) \le H(G^N)$. In other words, if the genie provides one bit of side information per
channel use, the capacity benefit $\Delta C = C_G - C$ cannot be more than 1 bit, regardless of the
kind of side information. ■
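
As a quick sanity check of Theorem 2, the following minimal sketch works through a toy case that is not taken from the text: a binary symmetric channel with crossover probability `p`, where a hypothetical genie reveals the noise bit to the receiver. The receiver can then undo the noise, so the capacity rises from 1 - h(p) to 1 bit, and the gain h(p) equals the entropy rate of the genie information. The function names are illustrative only.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bsc_capacity(p: float) -> float:
    """Capacity in bits of a binary symmetric channel with crossover p."""
    return 1.0 - binary_entropy(p)


def genie_gain_bsc(p: float):
    """Return (capacity gain, genie entropy rate) for the toy example.

    Assumption: the genie gives the i.i.d. noise bit Z to the receiver,
    which recovers X = Y xor Z, so the genie-aided capacity is 1 bit.
    """
    capacity_with_genie = 1.0
    gain = capacity_with_genie - bsc_capacity(p)
    return gain, binary_entropy(p)


if __name__ == "__main__":
    for p in (0.01, 0.11, 0.5):
        gain, rate = genie_gain_bsc(p)
        print(f"p={p:.2f}  gain={gain:.4f}  H(G)={rate:.4f}")
```

In this toy case the bound of Theorem 2 is met with equality, which suggests that the bound cannot be tightened without further assumptions on the genie information.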

4.2 Transmitter Side Information 

Theorem 3 The maximum possible capacity improvement from a fixed number of genie bits (per 
channel use) of side information is unbounded, even if the side information is causal. 

Proof: The proof is in the form of the following example. Consider the channel with input
alphabet $X \in \mathcal{X} = \{0, 1, 2, \cdots, 2N-1\}$, output alphabet $Y \in \mathcal{Y} = \mathcal{X} \cup \{\phi\}$ where $\phi$ is the erasure
symbol, and i.i.d. state sequence drawn from the alphabet $S \in \mathcal{S} = \{0, 1\}$. The input output
relationship is given by,

$$
Y = \begin{cases}
X & \text{when } X + S \text{ is even},\\
\phi & \text{when } X + S \text{ is odd}.
\end{cases}
$$


In simple words, when the channel state is S = 0, the channel conveys even inputs noise-free and 
erases odd inputs. Similarly, when the channel state is S = 1, the channel conveys odd inputs 
noise-free and erases even inputs. The channel state S is unknown to the receiver. 

Suppose the genie provides one bit of side information in the form of G = S to the transmitter 
every channel use. With perfect knowledge of S at the transmitter, the capacity of this channel is 
clearly $C_G = \log(N+1)$ where $N+1$ is the number of distinct outputs that can be affected by the
transmitter. On the other hand, with no state information at the transmitter the optimal input
distribution is uniform on $\mathcal{X}$, the corresponding output distribution is

$$
Y = \begin{cases}
\phi & \text{with probability } \frac{1}{2},\\
x & \text{with probability } \frac{1}{4N},\ x \in \mathcal{X},
\end{cases}
$$

and the capacity is

$$C = I(X;Y) = H(Y) - H(Y|X) = \frac{1}{2}\log(N) + \frac{1}{2}.$$



The capacity benefit of one genie bit therefore is

$$C_G - C = \log(N+1) - \frac{1}{2}\log(N) - \frac{1}{2} > \frac{1}{2}\log(N) - \frac{1}{2},$$

which is unbounded, i.e. goes to infinity as $N$ goes to infinity. Notice that $C_G$ is obtained with
only causal side information at the transmitter. Thus, the example shows that the capacity benefit 
of one bit of side information at the transmitter can be unbounded even when the side information 
is causal. ■ 
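
The closed-form rates in this example can be checked numerically. The sketch below is only an illustration under stated assumptions: the state is taken to be uniform on {0, 1} (as the output distribution above implies), logarithms are base 2, and the erasure symbol is encoded by the hypothetical sentinel `ERASURE`. It computes I(X;Y) for a uniform input directly from the joint distribution and compares it with the value log(N+1) quoted for the case where the transmitter knows the state.

```python
import math
from collections import defaultdict

ERASURE = -1  # hypothetical sentinel standing in for the erasure symbol


def no_state_info_rate(n: int) -> float:
    """I(X;Y) in bits for X uniform on {0,...,2n-1} and S uniform on {0,1}."""
    size = 2 * n
    joint = defaultdict(float)  # joint[(x, y)] = P(X = x, Y = y)
    for x in range(size):
        for s in (0, 1):
            y = x if (x + s) % 2 == 0 else ERASURE
            joint[(x, y)] += 0.5 / size
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    p_x = 1.0 / size
    return sum(p * math.log2(p / (p_x * p_y[y])) for (_, y), p in joint.items())


def state_at_transmitter_rate(n: int) -> float:
    """log2(n + 1), the value of C_G stated in the text."""
    return math.log2(n + 1)


if __name__ == "__main__":
    for n in (1, 4, 64, 1024):
        c = no_state_info_rate(n)
        gap = state_at_transmitter_rate(n) - c
        print(f"N={n:5d}  C={c:.4f}  C_G-C={gap:.4f}")
```

Running the loop for growing `n` shows the gap increasing roughly as half of log2(N), in line with the lower bound derived above.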
The contrasting potential capacity benefits of side information are summarized in Fig. 2.

While we have shown that, in theory, unbounded capacity improvement can result from a finite 
amount of causal side information at the transmitter, it is not clear how such side information and 
the associated capacity benefits can be obtained in practice. For practical systems, side informa- 
tion at the transmitter is often obtained through a feedback channel from the receiver. However, 
it is well known that for a DMC, even if all the past channel outputs obtained at the receiver are 
made available to the transmitter through a noise and delay free feedback channel, the capacity 
is unaffected. Thus, the unbounded potential capacity benefits of causal side information are in 
sharp contrast with the zero capacity benefit of causal feedback for a DMC. The key to reconciling 
these results lies in the definition of causal in both cases. While causal side information allows the
transmitter knowledge of the current channel state, causal feedback allows information only about
the past channel outputs. For a memoryless channel past outputs do not provide any information
about the current channel state. In the example considered above, causal feedback of the channel
outputs will provide the transmitter precise knowledge of all the past channel states, but no
information about the present channel state. Thus, the timing of the side information at the transmitter
can make the difference between unbounded capacity improvements and no improvement at all.

Figure 2: Value of Side Information at the Transmitter and Receiver. (a) Receiver Side Info.: One
genie bit cannot improve capacity by more than one bit. (b) Transmitter Side Info.: One genie bit
can improve capacity by an unbounded amount.

5 Advantage of Non-Causal Side Information over Causal Side 
Information 

The preceding section shows that the present state information at the transmitter can be invaluable 
compared to the information of all the past states. The natural question then is, what is the 
advantage of knowing the future channel states over knowing just the present and past states? 
In other words, what is the capacity advantage of non-causal side information over causal side 
information? It turns out, the answer to this question is already implicit in the results of Theorems
1 and 2.

Theorem 4 The capacity benefit of non-causal side information over causal side information is 
bounded as follows: 

$$C^{\text{non-causal}}(S_T, S_R) - C^{\text{causal}}(S_T, S_R) \le H(S_T|S_R)$$

Proof: From the preceding sections we have the following two results.

• If the transmitter side information is also available to the receiver, capacity with causal and
non-causal side information is identical. (Theorem 1)

• If a genie provides a bit of side information to the receiver it cannot improve capacity by
more than a bit. (Theorem 2)

Combining these two results, suppose the transmitter side information is made available to the
receiver by a genie. This requires $H(S_T|S_R)$ genie bits and therefore cannot improve capacity by
more than $H(S_T|S_R)$ bits. Using the results stated above, we have:

$$
\begin{aligned}
C^{\text{non-causal}}(S_T, (S_T, S_R)) - H(S_T|S_R) &\le C^{\text{non-causal}}(S_T, S_R) \le C^{\text{non-causal}}(S_T, (S_T, S_R))\\
C^{\text{causal}}(S_T, (S_T, S_R)) - H(S_T|S_R) &\le C^{\text{causal}}(S_T, S_R) \le C^{\text{causal}}(S_T, (S_T, S_R))\\
C^{\text{non-causal}}(S_T, (S_T, S_R)) &= C^{\text{causal}}(S_T, (S_T, S_R))
\end{aligned}
$$

and therefore

$$C^{\text{non-causal}}(S_T, S_R) - C^{\text{causal}}(S_T, S_R) \le H(S_T|S_R).$$

In simple words, the advantage of non-causal side information over causal side information is 
bounded by the number of genie bits required to make the transmitter side information available 
to the receiver as well. 



5.1 Example: Random Access 

Consider a single user DMC characterized by $P(Y|X)$ and with a capacity $C_0 = \max_{P(X)} I(X;Y)$
when the input is directly controlled by a transmitter. Now, suppose instead that the transmitter
is only able to access the channel (control the input) in a random manner as:

$$X = X_T S + X_r (1 - S),$$

where $S \in \{0, 1\}$ is a switch state that determines when the transmitter can access the channel
with the symbol $X_T$, and $X_r$ is a randomly generated input.



Figure 3: Random Access Channel

Suppose the state S is known to the transmitter and not known to the receiver. Such a channel 
is relevant to cognitive communication scenarios [26] and is also similar to the "memory with
stuck-at defects" problem considered in [4]. Clearly, if the switch state is provided to the receiver by
a genie the resulting capacity is $C(S,S) = \mathrm{Prob}(S = 1)\,C_0$. Since the extra information
provided by the genie is only one bit we have

$$\mathrm{Prob}(S = 1)\,C_0 \ge C^{\text{non-causal}}(S, \phi) \ge \mathrm{Prob}(S = 1)\,C_0 - 1.$$

Interestingly, the capacity with causal side information in this case is the same as the capacity with
no side information. This is easily seen by rewriting $X_T = f(U,S)$ as follows:

$$
X_T = \begin{cases}
f_0(U), & S = 0,\\
f_1(U), & S = 1
\end{cases}
\;=\; f_1(U), \quad S = 0, 1.
$$

In other words, the choice of input symbol does not matter when the switch is open ($S = 0$). Thus,
we have

$$C^{\text{causal}}(S, \phi) = C(\phi, \phi) \ge C(S, S) - 1.$$

The effect of memory in side information is also revealed by this example. Suppose the switch 
changes state in a block static model, i.e. it retains its state for $N$ symbols and then changes to
an i.i.d. realization. In this case, the genie only needs to provide one bit to the receiver every $N$
channel uses and the bounds are tighter.

$$\mathrm{Prob}(S = 1)\,C_0 = C(S,S) \ge C^{\text{non-causal}}(S,\phi) \ge C^{\text{causal}}(S,\phi) = C(\phi,\phi) \ge \mathrm{Prob}(S = 1)\,C_0 - \frac{1}{N}$$

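
The bounds of this example can be illustrated numerically for one concrete channel. The sketch below makes several assumptions that are not in the text: $P(Y|X)$ is a binary symmetric channel with crossover `p`, the random input $X_r$ is uniform on {0, 1}, the switch is i.i.d. with `q` = Prob(S = 1), and the block length is 1. Under these assumptions the channel seen by a transmitter with no side information is again a binary symmetric channel, with crossover q·p + (1 - q)/2. The names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def random_access_rates(q: float, p: float):
    """Return (C(S,S), C(phi,phi), genie lower bound) in bits per use.

    Assumptions: BSC(p) for P(Y|X), uniform random input X_r,
    i.i.d. switch with Prob(S = 1) = q, block length 1.
    """
    c0 = 1.0 - binary_entropy(p)          # capacity with full access
    c_switch_known = q * c0               # C(S,S) = Prob(S=1) * C_0
    p_eff = q * p + (1.0 - q) * 0.5       # effective crossover, S unknown
    c_no_info = 1.0 - binary_entropy(p_eff)
    lower = c_switch_known - binary_entropy(q)  # Theorem 2 with H(S) genie bits
    return c_switch_known, c_no_info, lower


if __name__ == "__main__":
    for q, p in ((0.5, 0.0), (0.9, 0.05), (0.2, 0.1)):
        upper, actual, lower = random_access_rates(q, p)
        print(f"q={q} p={p}: {upper:.4f} >= {actual:.4f} >= {lower:.4f}")
```

For example, with `q = 0.5` and a noiseless channel the capacity without side information is about 0.19 bits, which sits between the genie-aided value of 0.5 bits and the lower bound of -0.5 bits. The lower bound is loose here because the switch changes every symbol.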
6 Multiple Access Channel with Independent Side Information 

Achievable regions with causal side information are straightforward to obtain because the codewords 
can always be constructed on mappings from the side information to the channel input alphabet. 
In the multiple access channel, the two transmitters have (possibly correlated) side information
$S_{T1}, S_{T2}$ respectively, and the common receiver has side information $S_R$. The characterization of
the achievable region with side information is the same as without side information, with the codes
defined on auxiliary random variables that are independent of the side information, and the actual
channel input symbols chosen as a function of the auxiliary random variable and the instantaneous
side information. Thus the following achievable region is obtained.

$$
\begin{aligned}
R_1 &\le I(U_1; Y, S_R | U_2) = I(U_1; Y | U_2, S_R)\\
R_2 &\le I(U_2; Y, S_R | U_1) = I(U_2; Y | U_1, S_R)\\
R_1 + R_2 &\le I(U_1, U_2; Y, S_R) = I(U_1, U_2; Y | S_R)
\end{aligned}
$$

where $U_1, U_2$ are mutually independent as well as independent of the side information and the
channel inputs are given by $X_1 = f_1(U_1, S_{T1})$, $X_2 = f_2(U_2, S_{T2})$.

While a full converse is not known, upper bounds for the MAC with causal side information are
obtained in [23] in terms of the capacity achieved with transmitter cooperation. In this work, we 
focus on the multiple access channel with independent side information at the two transmitters. 
We prove that the sum capacity of the multiple access channel with independent side information 
is given by the corresponding constraint in the achievable region provided above. 

Theorem 5 The sum capacity of the discrete memoryless multiple access channel with causal side
information $S_{T1}, S_{T2}$ ($S_{T1}, S_{T2}$ independent) and $S_R$ available to transmitter 1, transmitter 2 and
the receiver, respectively, is given by

$$R_1 + R_2 = \max I(U_1, U_2; Y, S_R) \tag{9}$$

with $P(U_1, U_2, X_1, X_2, S_{T1}, S_{T2}) = P(U_1) P(U_2) P(X_1|U_1, S_{T1}) P(X_2|U_2, S_{T2}) P(S_{T1}, S_{T2})$.

Proof: The converse is proved as follows:

$$
\begin{aligned}
n(R_1 + R_2) &\le I(W_1, W_2; Y^n, S_R^n) + n\epsilon\\
&= \sum_{i=1}^{n} I(W_1, W_2; Y_i, S_{R,i} \mid Y^{i-1}, S_R^{i-1}) + n\epsilon\\
&\le \sum_{i=1}^{n} \left[ H(Y_i, S_{R,i}) - H(Y_i, S_{R,i} \mid W_1, S_{T1}^{i-1}, W_2, S_{T2}^{i-1}, Y^{i-1}, S_R^{i-1}) \right] + n\epsilon\\
&= \sum_{i=1}^{n} I(U_{1,i}, U_{2,i}; Y_i, S_{R,i}) + n\epsilon
\end{aligned}
$$

where $U_{1,i} = (W_1, S_{T1}^{i-1})$ and $U_{2,i} = (W_2, S_{T2}^{i-1})$ are independent of $S_{T1,i}, S_{T2,i}$. Independence of
messages and transmitter side information implies independence of the auxiliary random variables $U_1$,
$U_2$ as well. ■

For non-causal side information, an achievable region is readily obtained when the side infor- 
mation at the two transmitters is independent. 



$$
\begin{aligned}
\{(R_1, R_2) :\ R_1 &\le I(U_1; Y, S_R | U_2) - I(U_1; S_{T1})\\
R_2 &\le I(U_2; Y, S_R | U_1) - I(U_2; S_{T2})\\
R_1 + R_2 &\le I(U_1, U_2; Y, S_R) - I(U_1; S_{T1}) - I(U_2; S_{T2})\}
\end{aligned}
$$

for all $P(U_1, X_1, U_2, X_2 | S_{T1}, S_{T2}) = P(U_1, X_1 | S_{T1}) P(U_2, X_2 | S_{T2})$. However, to the best of the
author's knowledge a converse has not been shown for independent side information [footnote 1].
Correlation of the side information makes even the achievable region non-trivial, as the possibility
of Slepian-Wolf coding of correlated side information at the two transmitters can be exploited.

For our purpose however, we show that with independent side information at the transmitters,
if all the transmitter side information is also made available to the receiver, i.e. $(S_{T1}, S_{T2}) = f(S_R)$,
then the MAC capacity region with causal side information is identical to the capacity region with
non-causal side information.

Footnote 1: The problem with extending the single user approach appears to be that including $Y^{i-1}$ into
the auxiliary random variables makes them correlated.

Theorem 6 For the discrete memoryless multiple access channel with side information $S_{T1}, S_{T2}$
(independent) and $S_R$ available to transmitter 1, transmitter 2 and the receiver, respectively, if
$(S_{T1}, S_{T2}) = f(S_R)$ then the capacity region for both causal and non-causal side information is given
by the convex hull of all rate pairs $(R_1, R_2)$ satisfying the following inequalities:

$$
\begin{aligned}
R_1 &\le I(X_1; Y | S_R, X_2)\\
R_2 &\le I(X_2; Y | S_R, X_1)\\
R_1 + R_2 &\le I(X_1, X_2; Y | S_R)
\end{aligned}
$$

for all $P(X_1|S_{T1}), P(X_2|S_{T2})$.

Proof: [Achievability] Achievability is easily established as follows. Starting with the achievable 
region for the causal side information case, we have: 

$$
\begin{aligned}
R_1 &\le I(U_1; Y | U_2, S_R) && (10)\\
&= H(Y | U_2, S_R) - H(Y | U_2, U_1, S_R) && (11)\\
&= H(Y | U_2, S_R, X_2) - H(Y | U_2, U_1, S_R, X_2, X_1) && (12)\\
&= H(Y | X_2, S_R) - H(Y | X_1, X_2, S_R) && (13)\\
&= I(X_1; Y | S_R, X_2). && (14)
\end{aligned}
$$

Equation (12) follows from the fact that $X_1$ (resp. $X_2$) is a function of $U_1, S_{T1}$ (resp. $U_2, S_{T2}$) and
$S_{T1}$ (resp. $S_{T2}$) is a function of $S_R$. Thus, $X_1$ (resp. $X_2$) is a function of $U_1, S_R$ (resp. $U_2, S_R$).
The corresponding inequalities for $R_2$ and the sum rate $R_1 + R_2$ are similarly obtained. Clearly,
what is achievable with causal side information is also achievable with non-causal side information. 
■ 

Proof: [Converse] For the converse, we start with the individual rate constraints. For both 
causal and non-causal side information we have: 

$$
\begin{aligned}
nR_1 &\le I(W_1; Y^n, S_R^n | W_2) + n\epsilon && (15)\\
&= \sum_{i=1}^{n} I(W_1; Y_i \mid W_2, Y^{i-1}, S_R^n) + n\epsilon && (16)\\
&= \sum_{i=1}^{n} \left[ H(Y_i \mid S_R^n, W_2, Y^{i-1}, X_{2,i}) - H(Y_i \mid W_2, W_1, S_R^n, Y^{i-1}, X_{1,i}, X_{2,i}) \right] + n\epsilon && (17)\\
&\le \sum_{i=1}^{n} \left[ H(Y_i \mid S_{R,i}, X_{2,i}) - H(Y_i \mid W_2, W_1, S_R^n, Y^{i-1}, X_{1,i}, X_{2,i}) \right] + n\epsilon && (18)\\
&= \sum_{i=1}^{n} \left[ H(Y_i \mid S_{R,i}, X_{2,i}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_{R,i}) \right] + n\epsilon && (19)\\
&= \sum_{i=1}^{n} I(X_{1,i}; Y_i \mid X_{2,i}, S_{R,i}) + n\epsilon && (20)
\end{aligned}
$$

Independence of side information ensures that $X_{1,i} \to S_{T1,i} \to S_{T2,i} \to X_{2,i}$ form a Markov chain as
required.

Equation (17) follows from the fact that $X_{2,i}$ is a function of $W_2$ and $S_R^n$ (resp. $S_R^i$) for non-causal
(resp. causal) side information. (18) follows because conditioning reduces entropy, and (19) because $Y_i$ is
independent of $Y^{i-1}, W_1, W_2, S_R^{i-1}, S_{R,i+1}^n$ given $S_{R,i}, X_{1,i}, X_{2,i}$. Note that the inequality in (18) is
necessary because $Y_i$ is not independent of $Y^{i-1}, W_2, S_R^{i-1}, S_{R,i+1}^n$ given $S_{R,i}, X_{2,i}$. Intuitively, $Y^{i-1}$
contains some information about $W_1$ and thus $X_{1,i}$ which affects $Y_i$. The converse for $R_2$ follows
similarly. Finally, for the sum rate we have:

$$
\begin{aligned}
n(R_1 + R_2) &\le I(W_1, W_2; Y^n, S_R^n) + n\epsilon\\
&= I(W_1, W_2; Y^n | S_R^n) + n\epsilon\\
&= \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}, S_R^n) - H(Y_i \mid W_1, W_2, Y^{i-1}, S_R^n) \right] + n\epsilon\\
&\le \sum_{i=1}^{n} \left[ H(Y_i \mid S_{R,i}) - H(Y_i \mid X_{1,i}, X_{2,i}, S_{R,i}) \right] + n\epsilon\\
&= \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_i \mid S_{R,i}) + n\epsilon
\end{aligned}
$$




6.1 Genie Bits and Value of Side Information for a Multiple Access Channel 

Theorem 6 extends the single user result of Theorem 1 to the multiple access channel. By analogy
to the single user case, a relationship between causal and non-causal side information capacity can
be obtained by extending the results of Theorem 2 to the multiple access channel. Recall that
in Theorems 2 and 3 we characterized the potential capacity benefits of side information at the
transmitter and receiver for single user communications. Clearly, Theorem 3 extends directly to
the multiple access channel, because if unbounded capacity gains are possible with transmitter side
information in a single user channel, then the same must be true of the multiple access channel
which includes the single user capacity as a special point in its capacity region. The extension of
Theorem 2 to the multiple access channel is only slightly less straightforward as the derivation for
the single user channel can be easily modified to the multiple access case as follows.

Theorem 7 For the multiple access channel, the maximum possible sum capacity improvement 
$C_\Sigma^G - C_\Sigma$ due to the availability of receiver side information is bounded by the amount of the side
information itself.

$$C_\Sigma^G - C_\Sigma \le \mathcal{H}(G) = \lim_{N\to\infty} \frac{1}{N} H(G_1, G_2, \cdots, G_N).$$

Proof:

$$
\begin{aligned}
C_\Sigma^G &= \sup_{P(X_1^N(W_1),\, X_2^N(W_2))} \lim_{N\to\infty} \frac{1}{N} I(W_1, W_2; Y^N, G^N)\\
&= \sup_{P(X_1^N(W_1),\, X_2^N(W_2))} \lim_{N\to\infty} \frac{1}{N} \left( I(W_1, W_2; Y^N) + I(W_1, W_2; G^N | Y^N) \right)\\
&= C_\Sigma + \Delta C_\Sigma
\end{aligned}
$$

where $C_\Sigma^G$ is the sum capacity with the side information provided by the genie, $C_\Sigma$ is the sum
capacity without the side information and $\Delta C_\Sigma$, the capacity improvement, is bounded by the
entropy rate $\mathcal{H}(G)$. Thus, the single user capacity result extends directly to the multiple access
sum capacity. If the genie provides one bit of side information to the common receiver per channel
use, the sum capacity benefit $\Delta C_\Sigma = C_\Sigma^G - C_\Sigma$ cannot be more than 1 bit, regardless of the kind of
side information. ■



6.2 Advantage of Non-causal Side Information over Causal Side Information 

We compare the causal and non-causal capacity regions in terms of the sum rate point $C_\Sigma$. Similar
to the single user case, we have shown that the sum capacities (and the entire capacity regions) are
identical when $S_{T1}, S_{T2}$ (independent) are also available to the receiver. To make this information
available to the receiver requires $H(S_{T1}, S_{T2}|S_R)$ genie bits per symbol. And because we have
shown for the multiple access channel that genie bits cannot improve capacity by more than their
own entropy, we have the following result:

Theorem 8 For the multiple access channel with independent side information at the transmitters,
the sum capacity benefit of non-causal side information over causal side information is bounded as
follows:

$$C_\Sigma^{\text{non-causal}}(S_{T1}, S_{T2}, S_R) - C_\Sigma^{\text{causal}}(S_{T1}, S_{T2}, S_R) \le H(S_{T1}, S_{T2}|S_R)$$

Proof:

$$
\begin{aligned}
C_\Sigma(S_{T1}, S_{T2}, (S_R, S_{T1}, S_{T2})) &\ge C_\Sigma^{\text{non-causal}}(S_{T1}, S_{T2}, S_R)\\
&\ge C_\Sigma^{\text{causal}}(S_{T1}, S_{T2}, S_R)\\
&\ge C_\Sigma(S_{T1}, S_{T2}, (S_R, S_{T1}, S_{T2})) - H(S_{T1}, S_{T2}|S_R).
\end{aligned}
$$

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6.3 Example: Random Access Channel with Multiple Users 



The following example presents a scenario with correlated side information and shows how the
results may be applicable in several such cases as well. Consider the random access channel described
before, except now the channel input is controlled by two users as:

$$X = X_1 S + X_2 (1 - S).$$

Thus, a scheduler randomly allows user 1 or user 2 to access the channel in bursts of $N$ symbols. As
before, the switch state $S$ is known to the transmitters but not to the receiver.

Figure 4: Random Access Channel with Two Users (inputs $X_1$ from User 1 and $X_2$ from User 2 are
switched into the channel input $X$, which passes through $P(Y|X)$ to give the output $Y$)

If a genie provides $S$ to the receiver, the sum capacity is $C_\Sigma^G$. Using the same arguments as in
the single user example, we have:

$$\begin{aligned}
C_\Sigma^G = C_\Sigma(S, S, S) &\ge C_\Sigma^{\text{non-causal}}(S, S, \phi) \\
&\ge C_\Sigma^{\text{causal}}(S, S, \phi) \\
&\ge C_\Sigma(S, S, S) - \frac{1}{N} H(S) = C_\Sigma^G - \frac{1}{N}.
\end{aligned}$$

The practical implications of this example are quite interesting. In practice, random access is
handled by the medium access layer (MAC layer) through explicit handshakes between the transmitter
and receiver in the form of RTS-CTS (request to send, clear to send) messages. For a rapidly
varying random access channel such exchanges can constitute a major overhead. Practical wireless
systems such as the 802.11 Wireless LANs use nearly half the resources just for MAC layer
overheads. However, the capacity results presented above show that even in extremely rapidly
varying random access channels, where a transmission may or may not be made in each symbol period
without the knowledge of the receiver, the capacity loss is limited to at most one bit per channel use.
For random access that fluctuates at a less rapid scale, the loss is negligible. This has interesting
implications for how the RTS-CTS overhead can be minimized in practical systems through coding
across access attempts.
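To make the dependence on the burst length concrete, the bound H(S)/N on the per-symbol sum capacity loss can be tabulated. The Python sketch below is only an illustration: the function names and the parameter `p_user1` (the probability that the scheduler selects user 1) are hypothetical, and the equiprobable default reproduces the 1/N figure used in the example.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) switch state."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_sum_capacity_loss(block_length, p_user1=0.5):
    """Upper bound H(S)/N on the per-symbol loss when the receiver does not know S.

    One switch state S is drawn per burst of `block_length` symbols, so the
    genie needs at most H(S) bits per burst to reveal it to the receiver.
    """
    return binary_entropy(p_user1) / block_length


for n in (1, 10, 100, 1000):
    print(f"N = {n:4d}: loss <= {max_sum_capacity_loss(n):.4f} bits per channel use")
```

With N = 1 the bound is a full bit per channel use, while for bursts of 100 symbols it is already down to 0.01 bits per channel use.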
7 The Broadcast Channel with Side Information



In the general broadcast channel, the transmitter has side information $S_T$ and the two receivers
have side information $S_{R1}$ and $S_{R2}$. The capacity region of the general broadcast channel is not
known even without side information. The best known achievable region is due to Marton [27]
with auxiliary variables $U$, $V$ and $W$. When the transmitter has causal side information, Marton's
achievable region is directly extended to the convex hull of rate pairs $(R_1, R_2)$ satisfying:

$$\begin{aligned}
R_1 &\le I(W, U; Y_1, S_{R1}) = I(W, U; Y_1 | S_{R1}) \\
R_2 &\le I(W, V; Y_2, S_{R2}) = I(W, V; Y_2 | S_{R2}) \\
R_1 + R_2 &\le \min\{I(W; Y_1, S_{R1}), I(W; Y_2, S_{R2})\} + I(U; Y_1, S_{R1} | W) + I(V; Y_2, S_{R2} | W) - I(U; V | W) \\
&= \min\{I(W; Y_1 | S_{R1}), I(W; Y_2 | S_{R2})\} + I(U; Y_1 | S_{R1}, W) + I(V; Y_2 | S_{R2}, W) - I(U; V | W)
\end{aligned}$$

where $P(U, V, W, X | S_T) = P(U, V, W) P(X | U, V, W, S_T)$. Note that the only difference between the
achievable region with causal side information and Marton's innerbound without side information
is that with causal side information the mapping from the auxiliary random variables $U, V, W$
to the channel input alphabet $\mathcal{X}$ depends on the current value of the side information $S_T$, i.e.
$X = f(U, V, W, S_T)$. In both cases $U, V, W$ are independent of the side information. As usual,
the achievability proof for causal side information is straightforward, as the mapping $f(\cdot)$ can be
incorporated into the channel to obtain a case with no side information where Marton's innerbound
applies.
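The step of incorporating the mapping into the channel can be sketched in code. The Python example below is a simplified illustration rather than the construction used for the broadcast region: it assumes a single auxiliary variable u and a single receiver, and the function `effective_channel`, the toy channel and both strategies are hypothetical. It computes the induced channel P(y|u) = sum over s of P(s) P(y | f(u,s), s).

```python
def effective_channel(p_state, channel, strategy, inputs_u, outputs_y):
    """Fold a causal encoder x = f(u, s) into the channel.

    p_state:  dict s -> P(s), the i.i.d. state distribution
    channel:  dict (x, s) -> dict y -> P(y | x, s)
    strategy: function f(u, s) -> x, the causal mapping
    Returns dict u -> dict y -> P(y | u) = sum_s P(s) P(y | f(u, s), s).
    """
    result = {}
    for u in inputs_u:
        row = {y: 0.0 for y in outputs_y}
        for s, ps in p_state.items():
            x = strategy(u, s)
            for y, pyx in channel[(x, s)].items():
                row[y] += ps * pyx
        result[u] = row
    return result


# Toy channel (assumption): the state either passes the bit (s = 0) or flips it (s = 1).
p_state = {0: 0.5, 1: 0.5}
channel = {
    (0, 0): {0: 1.0, 1: 0.0},
    (1, 0): {0: 0.0, 1: 1.0},
    (0, 1): {0: 0.0, 1: 1.0},
    (1, 1): {0: 1.0, 1: 0.0},
}

blind = effective_channel(p_state, channel, lambda u, s: u, (0, 1), (0, 1))
aware = effective_channel(p_state, channel, lambda u, s: u ^ s, (0, 1), (0, 1))
print("ignoring the state:", blind)
print("using the state causally:", aware)
```

In this toy case the state-blind mapping induces a useless channel, while the mapping that uses the current state induces a noiseless one. In both cases the result is an ordinary channel from u to y with no side information.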

Extension of Marton's innerbound to the case where non-causal side information is available at
the transmitter is less straightforward. A simple and elegant proof of achievability of a subset of
Marton's innerbound with only the auxiliary random variables $U, V$ was provided by El Gamal
and van der Meulen in [28]$^2$. The achievable region with only $U, V$ is the set of rates $(R_1, R_2)$
satisfying

$$\begin{aligned}
R_1 &\le I(U; Y_1) \\
R_2 &\le I(V; Y_2) \\
R_1 + R_2 &\le I(U; Y_1) + I(V; Y_2) - I(U; V)
\end{aligned}$$

for all $P(U, V, X)$.

$^2$ While their proof directly addresses only the random variables $U, V$, it can be extended to include $W$ and hence
the entire Marton's innerbound region.

El Gamal and van der Meulen's simple achievability proof was extended by the author in [29]
to incorporate non-causal side information as follows:

$$\begin{aligned}
R_1 &\le I(U; Y_1, S_{R1}) - I(U; S_T) \\
R_2 &\le I(V; Y_2, S_{R2}) - I(V; S_T) \\
R_1 + R_2 &\le I(U; Y_1, S_{R1}) + I(V; Y_2, S_{R2}) - I(U; V) - I(U, V; S_T)
\end{aligned}$$

for all $P(U, V, X | S_T) = P(U, V) P(X | U, V, S_T)$. While a full extension of Marton's innerbound with
side information has been found recently, the proof of the above-mentioned limited extension is
interesting for its simplicity and also illustrates the binning concept central to the full extension. Since
the same concept can be applied to obtain achievable regions for many multiuser communication
scenarios with non-causal side information [29], we include a sketch of this proof here.

[Sketch of Proof:] Proceeding as in [28], $2^{n(I(U; Y_1, S_{R1}) - \epsilon)}$ (resp. $2^{n(I(V; Y_2, S_{R2}) - \epsilon)}$) i.i.d. $U$
sequences (resp. $V$ sequences) are generated independently according to $P(U)$ (resp. $P(V)$) and
uniformly distributed over $2^{nR_1}$ (resp. $2^{nR_2}$) bins. This forms the codebook that is shared by all
parties prior to the beginning of communication. During the communication phase, the message index
$W_1 \in [1, 2^{nR_1}]$ (resp. $W_2 \in [1, 2^{nR_2}]$) marks the appropriate $U$ (resp. $V$) sequence bin selected.
Given the state sequence $S_T$, the transmitter finds a $U$ sequence and a $V$ sequence in the chosen bins
so that $U^n, V^n, S^n$ are jointly typical. The probability that independently generated $U^n$ (resp. $V^n$)
and $S^n$ are jointly typical is bounded by $2^{-n(I(U;S) - \delta(\epsilon))}$ (resp. $2^{-n(I(V;S) - \delta(\epsilon))}$). This is reflected
in the individual rate constraints. Finally, the probability that independently generated $U^n$, $V^n$ and
$S^n$ are jointly typical is bounded by
$2^{-n(H(U) + H(V) + H(S) - H(U,V,S) - \delta(\epsilon))} = 2^{-n(I(U;V) + I(U,V;S) - \delta(\epsilon))}$,
which results in the sum rate constraint. ■
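The way the typicality probabilities translate into the rate constraints can be made explicit with a heuristic expected-count calculation. The LaTeX fragment below is a sketch under simplifying assumptions (first-moment counting only, with the $\delta(\epsilon)$ terms suppressed and $S$ denoting the transmitter state); it is not a substitute for the full argument in [28, 29].

```latex
% Heuristic expected-count argument; S denotes the transmitter state S_T.
\begin{align*}
\text{sequences per } U \text{ bin} &\approx 2^{n(I(U;Y_1,S_{R1}) - \epsilon - R_1)}, \\
\Pr\{(U^n, S^n) \text{ jointly typical}\} &\approx 2^{-n I(U;S)}, \\
\text{expected typical sequences per bin} &\approx 2^{n(I(U;Y_1,S_{R1}) - I(U;S) - R_1 - \epsilon)},
\end{align*}
which grows with $n$ only if $R_1 < I(U;Y_1,S_{R1}) - I(U;S)$. For a pair of bins,
\begin{align*}
\text{pairs } (U^n, V^n) \text{ per bin pair} &\approx 2^{n(I(U;Y_1,S_{R1}) + I(V;Y_2,S_{R2}) - 2\epsilon - R_1 - R_2)}, \\
\Pr\{(U^n, V^n, S^n) \text{ jointly typical}\} &\approx 2^{-n(I(U;V) + I(U,V;S))},
\end{align*}
so a jointly typical triple is expected only if
$R_1 + R_2 < I(U;Y_1,S_{R1}) + I(V;Y_2,S_{R2}) - I(U;V) - I(U,V;S)$.
```

The same counting applied to the $V$ bins alone gives the constraint on $R_2$.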

The complete extension of Marton's innerbound to non-causal side information was
obtained in an independent and parallel work by Steinberg and Shamai [30]. With non-causal
side information at the transmitter, Marton's innerbound becomes the convex hull of all rate pairs
$(R_1, R_2)$ satisfying [30]:

$$\begin{aligned}
R_1 &\le I(W, U; Y_1, S_{R1}) - I(W, U; S_T) \\
R_2 &\le I(W, V; Y_2, S_{R2}) - I(W, V; S_T) \\
R_1 + R_2 &\le -\left[\max\{I(W; Y_1, S_{R1}), I(W; Y_2, S_{R2})\} - I(W; S_T)\right]^{+} \\
&\quad + I(W, U; Y_1, S_{R1}) - I(W, U; S_T) + I(W, V; Y_2, S_{R2}) - I(W, V; S_T) - I(U; V | W, S_T)
\end{aligned}$$

for all $P(U, V, W, X | S_T)$.

7.1 Genie Bounds and the Relative Advantage of Non-causal Side Information
over Causal Side Information

Since the capacity region is not known for the general broadcast channel, there are two interesting 
alternatives to consider. As a first alternative, we could limit our attention to the special cases 
where the capacity region is known, such as the degraded broadcast channel. However, since our 
interest is to bound the sum capacity improvement with causal and non-causal side information, the 
degraded case becomes trivial. For the degraded broadcast channel the sum capacity is the single 
user capacity of the stronger user, and therefore all the single user results obtained in previous 
sections apply. 

Another possibility is to limit our attention to Marton's innerbound region instead of the actual 
capacity region. In that case, the natural question is whether the extensions of Marton's innerbound 
for causal and non-causal side information become identical when the side information at the 
transmitter is also available at both the receivers. While this result was shown to be true for the 
general multiple access channel, it is interesting that the corresponding extension for the broadcast 
channel turns out to be less general. 

Observation 1 The extensions of Marton's innerbound with causal and non-causal side information
(as described above), with the transmitter side information also available at both receivers,
are identical if the following condition is satisfied:

$$I(U; S_T | W) + I(V; S_T | W) = I(U, V; S_T | W)$$

Intuitively, one can interpret the condition as follows. The common state information is provided
by $W$ and therefore, conditioned on $W$, the auxiliary random variables $U$ and $V$ should provide
independent information about the state $S_T$. It is not known if this condition actually makes the
achievable region smaller. A sketch of the proof of this observation is provided in the Appendix.
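The condition in Observation 1 is tied to the chain-rule identity I(U;V|W,S) = I(U;V|W) + I(U,V;S|W) - I(U;S|W) - I(V;S|W), which holds for any joint distribution. The Python sketch below checks it numerically on an arbitrary random distribution; the binary alphabets, the seed and the helper names are illustrative assumptions only.

```python
import itertools
import math
import random


def marginal_entropy(joint, axes):
    """Entropy in bits of the marginal of `joint` on the coordinates in `axes`."""
    marg = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        marg[sub] = marg.get(sub, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_mi(joint, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), all in bits."""
    return (marginal_entropy(joint, a + c) + marginal_entropy(joint, b + c)
            - marginal_entropy(joint, a + b + c) - marginal_entropy(joint, c))


# Coordinates of each outcome tuple: U, V, W, S (all binary here).
U, V, W, S = (0,), (1,), (2,), (3,)
rng = random.Random(1)
outcomes = list(itertools.product(range(2), repeat=4))
weights = [rng.random() for _ in outcomes]
total = sum(weights)
joint = {k: w / total for k, w in zip(outcomes, weights)}

lhs = cond_mi(joint, U, V, W + S)
rhs = (cond_mi(joint, U, V, W) + cond_mi(joint, U + V, S, W)
       - cond_mi(joint, U, S, W) - cond_mi(joint, V, S, W))
# The slack is zero exactly when the condition of Observation 1 holds.
slack = cond_mi(joint, U, S, W) + cond_mi(joint, V, S, W) - cond_mi(joint, U + V, S, W)
print(f"I(U;V|W,S) = {lhs:.6f}, right-hand side = {rhs:.6f}, condition slack = {slack:.6f}")
```

The slack equals I(U;V|W) - I(U;V|W,S), so the condition of Observation 1 is equivalent to the two conditional mutual informations being equal.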

Following our previous results, the next question is to bound the maximum capacity benefit of
genie bits. A straightforward extension of the single user result leads to the following bound: If a
genie provides $G_1$ bits of side information to receiver 1 and $G_2$ bits of side information to receiver
2, the sum capacity cannot improve by more than $G_1 + G_2$ bits. The proof is straightforward since
the improvement in the individual rates $R_1, R_2$ is bounded by $G_1, G_2$ respectively.

7.2 Example: Broadcast Channel 

Consider a two user fading broadcast channel where the users' channels experience i.i.d. block
fading (block size $N$) given by $h_1$ and $h_2$ respectively.

$$\begin{aligned}
Y_1 &= h_1 X + n_1 \\
Y_2 &= h_2 X + n_2
\end{aligned}$$

where $n_1, n_2 \sim \mathcal{N}(0, 1)$ are additive white Gaussian noise terms and a transmit power constraint
of $P$ is in place. It is assumed that each user knows only his own channel, while the transmitter
knows only the side information variable $S_T$ ($S_T = 1$ if $h_1 > h_2$ and 0 otherwise). While in a
degraded broadcast channel it is well known that the sum rate maximizing policy is to transmit
only to the strongest user, the problem faced here is that a receiver does not know when it has
the stronger channel compared to the other receiver's channel. Clearly, if a genie provides $S_T$ to
both receivers then the sum capacity of this channel is
$C_\Sigma^G = C_\Sigma(S_T, (h_1, S_T), (h_2, S_T)) = E \log(1 + P \max(|h_1|^2, |h_2|^2))$.
Since the genie's bits cannot increase capacity by more than 2 bits per fading block (one bit
to each receiver), the sum capacity (with causal or non-causal side information) can be bounded
as:

$$C_\Sigma^G \ge C_\Sigma^{\text{non-causal}}(S_T, h_1, h_2) \ge C_\Sigma^{\text{causal}}(S_T, h_1, h_2) \ge C_\Sigma^G - \frac{2}{N}.$$
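The size of the gap 2/N relative to the genie-aided sum capacity can be illustrated numerically. The Python sketch below is a Monte Carlo estimate under an assumption that the example itself does not make: Rayleigh fading, so that each channel gain |h|^2 is exponentially distributed with unit mean. The function names, sample count and seed are likewise hypothetical, and the logarithm is taken to base 2 so that rates are in bits.

```python
import math
import random


def genie_sum_capacity(power, num_samples=100_000, seed=0):
    """Monte Carlo estimate of E[log2(1 + P * max(|h1|^2, |h2|^2))].

    Assumes (hypothetically) i.i.d. Rayleigh fading, so |h|^2 ~ Exponential(1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        g1 = rng.expovariate(1.0)
        g2 = rng.expovariate(1.0)
        total += math.log2(1.0 + power * max(g1, g2))
    return total / num_samples


def sum_capacity_lower_bound(power, block_length, **kwargs):
    """Genie-aided estimate minus the 2/N penalty from the genie bound."""
    return genie_sum_capacity(power, **kwargs) - 2.0 / block_length


upper = genie_sum_capacity(10.0)
for n in (1, 10, 100):
    lower = sum_capacity_lower_bound(10.0, n)
    print(f"N = {n:3d}: {lower:.3f} <= sum capacity <= {upper:.3f} bits per channel use")
```

For long fading blocks the two sides of the bound nearly coincide, so under these assumptions the receivers lose little by not knowing which of them has the stronger channel.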
8 The Relay Channel 

The discrete memoryless relay channel with side information consists of source input alphabet
$\mathcal{X}_S$, relay input alphabet $\mathcal{X}_R$, state alphabets $\mathcal{S}_S, \mathcal{S}_R, \mathcal{S}_D$ at the source, relay and destination
(respectively), relay output alphabet $\mathcal{Y}_R$, the destination output alphabet $\mathcal{Y}_D$, and a probability
transition function $P(Y_R, Y_D | X_S, X_R, S_S, S_R, S_D)$.

Our first goal is to extend to the relay channel the single user result that the causal and
non-causal capacities coincide when the transmitter side information is also available to the receiver.
For the general relay channel, the capacity is unknown. For the special case of the degraded relay
channel with no side information the capacity was found by Cover and El Gamal [31]. In recent work,
Sigurjonsson and Kim [23] combine Shannon's proof technique [1] with Cover and El Gamal's [31]
relay coding theorems to determine the capacity of the degraded relay channel with causal side
information for the special case $S_R = S_S = S$, i.e. the same state information is available to both the
source and the relay. From the perspective of the single user result, we are interested in the scenario
where the side information available to the source and relay is also available to the destination.
Mathematically, we define the physically degraded channel with (causal or non-causal) side
information where $S_R = S_S = S = f(S_D)$ as follows:

$$P(Y_D, Y_R | X_S, X_R, S, S_D) = P(Y_R | X_S, X_R, S) P(Y_D | Y_R, X_R, S_D). \tag{21}$$

Figure 5 illustrates the relay channel with side information. The following theorem extends the
single user result to the relay channel.

Theorem 9 The capacity of the physically degraded relay channel described above with causal or
non-causal side information is given by:

$$C^{\text{causal}} = C^{\text{non-causal}} = \max_{P(X_S, X_R | S)} \min \left[ I(X_S, X_R; Y_D | S_D),\; I(X_S; Y_R | X_R, S) \right]$$

Figure 5: Relay channel with (a) Causal and (b) Non-causal Side Information

Proof: [Achievability] Achievability follows directly from the idea of multiplexed codebooks.
Separate codebooks are designed for each possible state $S$. As the states are revealed in a causal
fashion, the transmitter and the relay switch between the corresponding codebooks. Since the
receiver also knows $S$ (a deterministic function of $S_D$), the receiver makes the corresponding switch
as well. In this manner, for each state $S = s$, the capacity

$$C_s = \max_{P(X_S, X_R | S = s)} \min \left[ I(X_S, X_R; Y_D | S_D, S = s),\; I(X_S; Y_R | X_R, S = s) \right]$$

is achieved. Averaging over $S$ we obtain the capacity expression of Theorem 9. Note that codebook
multiplexing requires only causal side information. What is achievable with causal side information
is also achievable with non-causal side information. ■
Proof: [Converse]

$$\begin{aligned}
I(W; Y_D^n, S_D^n) &\le I(W; Y_D^n, Y_R^n | S_D^n) \\
&= \sum_{i=1}^{n} H(Y_{D,i}, Y_{R,i} | Y_D^{i-1}, Y_R^{i-1}, S_D^n) - H(Y_{D,i}, Y_{R,i} | Y_D^{i-1}, Y_R^{i-1}, S_D^n, W) \\
&= \sum_{i=1}^{n} H(Y_{D,i}, Y_{R,i} | Y_D^{i-1}, Y_R^{i-1}, S_D^n, X_{R,i}) - H(Y_{D,i}, Y_{R,i} | Y_D^{i-1}, Y_R^{i-1}, S_D^n, W, X_{R,i}, X_{S,i}) \\
&\le \sum_{i=1}^{n} H(Y_{D,i}, Y_{R,i} | X_{R,i}, S_{D,i}) - H(Y_{D,i}, Y_{R,i} | S_{D,i}, X_{R,i}, X_{S,i}) \\
&= \sum_{i=1}^{n} I(X_{S,i}; Y_{D,i}, Y_{R,i} | X_{R,i}, S_{D,i})
\end{aligned}$$

Using Fano's inequality and the time sharing variable to get a single letter characterization we get

$$\begin{aligned}
R &\le I(X_S; Y_D, Y_R | X_R, S_D) + \epsilon_n \\
&= I(X_S; Y_R | X_R, S_D) + I(X_S; Y_D | X_R, S_D, Y_R) + \epsilon_n
\end{aligned}$$

However, conditioned on $S_D$, the degraded channel condition implies that $X_S, (Y_R, X_R), Y_D$ is a
Markov chain, i.e. $I(X_S; Y_D | X_R, S_D, Y_R) = 0$ and we have,

$$\begin{aligned}
R &\le I(X_S; Y_R | X_R, S_D) + \epsilon_n \\
&= H(Y_R | X_R, S_D, S) - H(Y_R | X_R, S_D, X_S, S) + \epsilon_n \\
&\le H(Y_R | X_R, S) - H(Y_R | X_R, X_S, S) + \epsilon_n \\
&= I(X_S; Y_R | X_R, S) + \epsilon_n
\end{aligned}$$

Similarly, we have

$$\begin{aligned}
I(W; Y_D^n, S_D^n) &= \sum_{i=1}^{n} H(Y_{D,i} | Y_D^{i-1}, S_D^n) - H(Y_{D,i} | Y_D^{i-1}, S_D^n, W) && (22) \\
&\le \sum_{i=1}^{n} H(Y_{D,i} | S_{D,i}) - H(Y_{D,i} | Y_D^{i-1}, S_D^n, W, Y_R^{i-1}) && (23) \\
&= \sum_{i=1}^{n} H(Y_{D,i} | S_{D,i}) - H(Y_{D,i} | Y_D^{i-1}, S_D^n, W, Y_R^{i-1}, X_{S,i}, X_{R,i}) && (24) \\
&= \sum_{i=1}^{n} H(Y_{D,i} | S_{D,i}) - H(Y_{D,i} | S_{D,i}, X_{S,i}, X_{R,i}) && (25) \\
&= \sum_{i=1}^{n} I(X_{S,i}, X_{R,i}; Y_{D,i} | S_{D,i}) && (26)
\end{aligned}$$

Combining (22)-(26) with the bound on $I(X_S; Y_R | X_R, S)$ above and Fano's inequality we have,

$$R \le \min \left[ I(X_S, X_R; Y_D | S_D),\; I(X_S; Y_R | X_R, S) \right] + \epsilon_n$$

and the converse proof is complete. ■

8.1 Genie Bound and the Relative Advantage of Non-causal Side Information 
over Causal Side Information 

For the relay channel with only one destination, the result of Theorem 2 applies with the identical
proof. In other words, one bit of genie information at the receiver does not improve capacity by
more than one bit. Combining with the result of Theorem 9 we have the following result: For the
physically degraded relay channel with side information defined by (21) (in general without the
constraint $S = f(S_D)$) the difference between capacity with non-causal and causal side information
at the transmitter is bounded as: $C^{\text{non-causal}} - C^{\text{causal}} \le H(S | S_D)$.

9 Conclusion



Side information is a crucial factor in determining the capacity of a channel. Previous research has 
shown that for a single user memoryless channel even perfect feedback does not result in a capacity 
advantage. However, we find that even one bit of causal side information at the transmitter can 
increase capacity by an unbounded amount. The contrasting results are due to the fact that perfect 
feedback can only provide information about past channel states, whereas causal side information 
can provide information about the current channel state. Thus, for the transmitter the difference 
between the past state information and present state information is very significant. We further 
explore the relative advantage of knowing the future channel states (non-causal side information) 
relative to the knowledge of only the present channel state (causal side information). For a single 
user, we find that the knowledge of future channel states can increase capacity only if the state 
information is not available to the receiver. In other words, non-causal side information has no 
capacity benefit over causal side information when the side information is available to the receiver 
as well. Note that for the receiver it does not matter if the side information is made available 
causally or non-causally, because no delay constraint is assumed in the decoding operation for 
capacity results. We evaluate the benefits of receiver side information in the form of a genie bound, 
which states quite simply that the capacity advantage from one genie bit of side information at 
the receiver can not exceed one bit. This gives us a bound on the capacity benefit of non-causal 
side information at the transmitter over causal side information in the form of the number of bits 
required to inform the receiver of the transmitter side information that is not already available to 
the receiver. 

We follow single user results with multiuser scenarios. All the results are found to extend to the 
general multiple access channel as well. For the broadcast channel, the extension is less general. In 
particular, for the broadcast channel it is not clear (even in terms of Marton's innerbound) if there 
is any benefit of non-causal side information over causal side information when both receivers know 
all the transmitter side information. This equivalence is established only subject to an interesting 
constraint. Finally, we show that the single user results extend to the degraded relay channel. 
Examples of random access channels are provided throughout as illustrations of the capacity bounds. 
While the MAC layer overheads in practical systems can be excessive, the capacity results show that
such overheads can be nearly eliminated through clever coding across multiple access blocks.



Appendix

A Sketch of Proof of Observation 1

When $S_T = f_1(S_{R1}) = f_2(S_{R2})$, the following simplifications can be made, as in the proof of
the corresponding single user result:

$$\begin{aligned}
I(W, U; Y_1, S_{R1}) - I(W, U; S_T) &= I(W, U; Y_1 | S_{R1}) \\
I(W, V; Y_2, S_{R2}) - I(W, V; S_T) &= I(W, V; Y_2 | S_{R2}) \\
\max\{I(W; Y_1, S_{R1}), I(W; Y_2, S_{R2})\} - I(W; S_T) &= \max\{I(W; Y_1 | S_{R1}), I(W; Y_2 | S_{R2})\}
\end{aligned}$$

and finally

$$\begin{aligned}
&-\max\{I(W; Y_1 | S_{R1}), I(W; Y_2 | S_{R2})\} + I(W, U; Y_1 | S_{R1}) + I(W, V; Y_2 | S_{R2}) \\
&\quad = \min\{I(W; Y_1 | S_{R1}), I(W; Y_2 | S_{R2})\} + I(U; Y_1 | S_{R1}, W) + I(V; Y_2 | S_{R2}, W)
\end{aligned}$$

Using the simplifications it is easy to see that the causal case and the non-causal case are
identical if $I(U; V | W, S) = I(U; V | W)$. But,

$$I(U; V | W, S) = I(U; V | W) + I(U, V; S | W) - I(U; S | W) - I(V; S | W)$$

and thus we have the condition of Observation 1. ■